Patent Abstract:
A terminal apparatus is provided that does not require pitch or intonation information of a guide voice to be prepared in advance. The terminal apparatus receives delivered content data composed of performance data, consisting of a performance event sequence, and voice symbol data, consisting of voice symbols for each syllable of the lyrics accompanying the performance data. Musical sound is reproduced from the performance data, and a guide voice is synthesized on the basis of the voice symbol data. By reading the performance data in advance and controlling the speech synthesis section, the properties of the synthesized guide voice are changed in accordance with the performance data.
Publication number: KR20030010696A
Application number: KR1020027016964
Filing date: 2001-06-11
Publication date: 2003-02-05
Inventor: Saito Akitoshi
Applicant: Yamaha Corporation
Primary IPC class:
Patent Description:

Terminal device, guide voice reproducing method and storage medium
[2] In a cellular phone system such as the personal digital cellular telecommunication system (PDC) or the personal handyphone system (PHS), which are known as digital cellular systems, the occupied frequency bandwidth is narrow and the data transmission rate is low. For this reason, the voice signal of a call is transmitted with high-efficiency compression coding. As one such high-efficiency speech compression coding method, an analysis-synthesis coding method using a speech synthesis model composed of a sound source model and a vocal tract model is known. Analysis-synthesis coding schemes include the MPC (Multi-Pulse Excited LPC) scheme and the CELP (Code Excited LPC) scheme, which performs vector quantization using codebooks; CELP schemes have been put to practical use in various digital cellular systems.
[3] Meanwhile, karaoke systems have been proposed that enable a user to sing along with karaoke music reproduced from delivered karaoke data. Such karaoke systems are generally referred to as communication karaoke, and karaoke systems for delivering karaoke data to homes are also known. In such a karaoke system, song data of the requested piece, guide lyrics data displayed as a visual prompt on the screen, and, as necessary, image data serving as a background are delivered as karaoke data. The user views the guide lyrics reproduced from the delivered guide lyrics data and displayed on the screen, and sings in accordance with the reproduced musical sound.
[4] However, since the singer sings while viewing the guide lyrics displayed on a display, whose color changes in accordance with the progress of the music, there is a problem that it is difficult to perform karaoke in situations where the display cannot be seen. Such situations include, for example, while moving about, when no display is available, or when the display is too small.
[5] As a communication karaoke system that solves this problem, the communication karaoke system of Japanese Unexamined Patent Publication No. 11-167392 has been proposed. When delivering karaoke data including song data, background image data, and guide lyrics display data, this communication karaoke system attaches and transmits read-aloud lyrics data. Upon receiving these data, the karaoke apparatus reproduces the karaoke music from the song data and displays the guide lyrics, in accordance with the progress of the karaoke music, on a display that shows the background image based on the background image data. In addition, a voice synthesized according to the accent, sound intensity, and pitch information included in the read-aloud lyrics data is output in accordance with the read-out timing information contained in that data. This allows karaoke to be performed by listening to the speech-synthesized lyrics without looking at the display.
[6] However, the read-aloud lyrics must be uttered before the corresponding part of the song is sung, and the voice must be synthesized with a pitch and intonation corresponding to the melody of the song so that it is easy to sing along when heard. For this reason, the read-aloud lyrics data must include the accent of the synthesized voice, the strength and weakness of the sound, the pitch, and the read-out timing information, and there was a problem that this information had to be created by analyzing the melody and the like for each song.
[7] Meanwhile, since cellular phones are in widespread use, delivering karaoke data over a digital cellular system is conceivable. However, as described above, the digital cellular system has a low transmission rate and limited transmission capacity, so there was a problem that delivering karaoke data together with read-aloud lyrics data took a long time and the communication fee became high. Moreover, because karaoke data is delivered only after the user requests a song by name, a long delivery time posed the risk that the user would lose interest in singing.
[8] In addition, a mobile telephone would have to be provided with voice synthesizing means for synthesizing a voice from the read-aloud lyrics data, so there was a problem that the mobile telephone became expensive and could not be miniaturized because of the space occupied by the voice synthesizing means.
[9] SUMMARY OF THE INVENTION The present invention has been made in view of the above circumstances, and its first object is to provide a terminal apparatus, a guide voice reproducing method, and a storage medium storing a program for executing the method that do not require pitch or intonation information of the guide voice to be prepared.
[10] A second object of the present invention is to provide a terminal apparatus and a guide voice reproducing method that can deliver karaoke data in a short time even at a low transmission rate, without requiring dedicated voice synthesizing means for reproducing the guide voice, and a storage medium storing a program for executing the method.
[1] BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a terminal apparatus and guide voice reproducing method for performing karaoke with delivered content data, preferably applicable to karaoke apparatuses, mobile telephones, and the like, and to a storage medium storing a program for executing the method.
[27] BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a diagram showing an example configuration of a mobile telephone to which the terminal apparatus according to the first embodiment of the present invention is applied, together with a base station.
[28] FIG. 2 is a diagram illustrating a detailed configuration of a voice compression synthesis unit and a database in the telephone function unit of the mobile telephone of FIG. 1.
[29] FIG. 3 is a diagram showing a flow of processing of performance data along with a functional block diagram of a processing unit of the telephone function unit shown in FIG. 1.
[30] FIG. 4 is a diagram illustrating the configuration of karaoke data used in the mobile telephone of FIG. 1.
[31] FIG. 5 is a conceptual diagram of downloading karaoke data to the mobile telephone of FIG. 1.
[32] FIG. 6 is a diagram showing an example configuration of a karaoke apparatus to which the terminal apparatus according to the second embodiment of the present invention is applied, together with a delivery center.
[33] FIG. 7 is a diagram illustrating a detailed configuration of the speech synthesis unit and the database in the control unit of the karaoke apparatus of FIG. 6.
[11] In order to achieve the first object, according to the first aspect of the present invention, there is provided a terminal apparatus to which content data is delivered, the content data being composed of performance data consisting of a performance event sequence and voice symbol data consisting of voice symbols for each syllable of the lyrics accompanying the performance data.
[12] A terminal apparatus according to the first aspect of the present invention includes a musical sound synthesis section for reproducing musical sound from the performance data, a speech synthesis section for synthesizing a guide voice based on the voice symbol data, and a speech synthesis control section for reading the performance data in advance and controlling the speech synthesis section so as to change the properties of the guide voice synthesized by the speech synthesis section in accordance with the performance data.
[13] With such a terminal apparatus, the properties of the guide voice synthesized by the speech synthesis section can be changed in accordance with the performance data by reading the performance data in advance and controlling the speech synthesis section, so that pitch and intonation information of the guide voice need not be prepared. This eliminates the work of analyzing the melody and the like for each piece of music to create the accent of the synthesized voice, the strength and weakness of the sound, the pitch, and the read-out timing information.
[14] In addition, since the pitch and intonation information of the guide voice need not be included in the delivered data, the amount of data to be delivered can be reduced. Furthermore, since the utterance timing of the guide voice can be controlled by reading and analyzing the performance data in advance, the amount of data to be delivered can be reduced still further.
[15] The performance data is preferably performance data in MIDI format, and the voice symbol data is preferably inserted into the performance data as an exclusive message.
[16] Preferably, the terminal apparatus further includes an analysis section for analyzing the performance data of the vocal line within the performance data, and the speech synthesis control section controls the speech synthesis section according to the analysis result of the analysis section so that the pitch and intonation of the guide voice synthesized by the speech synthesis section change in accordance with the vocal line.
[17] Preferably, the speech synthesis control unit controls the synthesis timing of the speech synthesis unit according to the analysis result of the analysis unit so that the guide speech synthesized by the speech synthesis unit is uttered before the corresponding vocal line.
[18] More preferably, the speech synthesis control section supplies the speech synthesis section with speech parameters read from a speech database according to the voice symbol data and the analysis result of the analysis section, so that each syllable of the guide voice synthesized by the speech synthesis section follows the voice symbol data while the pitch and intonation of the guide voice change in accordance with the vocal line.
[19] In order to achieve the second object, according to a second aspect of the present invention, there is provided a terminal apparatus to which content data is delivered, the content data being composed of performance data consisting of a performance event sequence and voice symbol data consisting of voice symbols for each syllable of the lyrics accompanying the performance data.
[20] A terminal apparatus according to the second aspect of the present invention includes a telephone function section that enables calls, a musical sound synthesis section that reproduces musical sound from the performance data, and a speech synthesis section that synthesizes a guide voice based on the voice symbol data and also decodes the voice data used for calls.
[21] With this terminal apparatus, the guide voice is synthesized using the speech synthesis section that decodes call voice data, which a digital cellular mobile telephone already contains, so no new speech synthesis section is needed. This keeps the apparatus small, since no additional installation space is required even when guide voice output is added, and since the speech synthesis section serves both purposes, an increase in cost can be suppressed.
[22] Preferably, the terminal apparatus further includes a speech synthesis control section for reading the performance data in advance and controlling the speech synthesis section so as to change the properties of the guide voice synthesized by the speech synthesis section in accordance with the performance data.
[23] It is likewise preferable that the terminal apparatus further include an analysis section for analyzing the performance data of the vocal line within the performance data, and a speech synthesis control section that controls the speech synthesis section in accordance with the analysis result of the analysis section so that the pitch and intonation of the guide voice synthesized by the speech synthesis section change in accordance with the vocal line.
[24] In order to achieve the first object, according to a third aspect of the present invention, there is provided a guide voice reproducing method using a terminal apparatus to which content data is delivered, the content data being composed of performance data consisting of a performance event sequence and voice symbol data consisting of voice symbols for each syllable of the accompanying lyrics.
[25] The guide voice reproducing method according to the third aspect of the present invention reproduces musical sound from the performance data, synthesizes a guide voice based on the voice symbol data, and, by reading the performance data in advance, changes the properties of the synthesized guide voice in accordance with the performance data.
[26] In order to achieve the first object, according to a fourth aspect of the present invention, there is provided a storage medium storing a program for causing a computer to execute the guide voice reproducing method.
[34] EMBODIMENT OF THE INVENTION Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[35] FIG. 1 shows an example configuration of a mobile telephone to which the terminal apparatus according to the first embodiment of the present invention is applied, together with a base station.
[36] In FIG. 1, reference numeral 1 denotes a mobile telephone according to the present invention, and 2 denotes a base station managing a radio zone. A digital cellular system generally employs a small-zone scheme, in which a plurality of radio zones are arranged within the service area. Each of these radio zones is managed by a base station 2; when the mobile telephone 1, serving as a mobile station, calls a fixed telephone, the mobile telephone 1 is connected via the base station 2 to an exchange, and from the exchange to the general telephone network. Details will be described later.
[37] The cellular phone 1 is provided with an extendable antenna 10 connected to a transceiver section 11. The transceiver section 11 demodulates the signal received by the antenna 10, and modulates the signal to be transmitted and supplies it to the antenna 10. The telephone function section 12 has processing means for making the cellular phone 1 function as a telephone when talking to another telephone, and a voice compression synthesis section 22 having CELP encoder and decoder functions for high-efficiency voice compression. Here, voice parameters read from the database 24 can be supplied to the voice compression synthesis section 22, and voices corresponding to those voice parameters can be synthesized using the decoder function of the voice compression synthesis section 22. In other words, the voice compression synthesis section 22 can also function as voice synthesizing means. The database 24 stores voice parameters for the syllables from "a" through "n" as well as voiced sounds.
[38] During a call, the voice signal input from the microphone 21 is compression-encoded with high efficiency by the encoder function of the voice compression synthesis section 22 of the telephone function section 12, modulated by the transceiver section 11, and transmitted from the antenna 10. Conversely, the highly compressed encoded voice data received by the antenna 10 is demodulated by the transceiver section 11, decoded into the original voice signal by the voice compression synthesis section 22 of the telephone function section 12, and output from the output section 20, which consists of a speaker and the like. In this way, signals are transmitted and received via the transceiver section 11 and the telephone function section 12 during a call.
[39] The storage means 13 is a memory in which karaoke data, delivered as described later, is temporarily stored. The karaoke data is composed of performance data consisting of a sequence of performance events of the requested song, and voice symbol data consisting of voice symbols for each syllable of the lyrics accompanying the performance data. The karaoke data may also contain guide lyrics data for displaying the lyrics on a display. This karaoke data is in MIDI format, and as shown in FIG. 4, the voice symbol data of the lyrics is inserted into the MIDI data as an exclusive message. The data amount of one piece of karaoke data can therefore be kept small, so that even a digital cellular system with a low transfer bit rate can transmit one piece of karaoke data in a short time.
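As a rough illustration of how per-syllable voice symbols might be packed into such an exclusive message, the following sketch builds one. The separator byte, the use of the non-commercial manufacturer ID 0x7D, and the romanized ASCII syllables are all assumptions made for illustration; the patent does not specify this layout.

```python
def build_lyric_sysex(syllables):
    """Pack a phrase's syllable symbols into a MIDI exclusive (SysEx) message.

    Hypothetical layout: 0xF0, manufacturer ID 0x7D (reserved for
    non-commercial use), syllable bytes separated by 0x00, then the
    0xF7 end-of-exclusive byte. SysEx data bytes must stay below 0x80,
    so romanized ASCII syllables are used here.
    """
    body = bytearray([0xF0, 0x7D])
    for i, syllable in enumerate(syllables):
        if i:
            body.append(0x00)  # syllable separator (assumption)
        body.extend(syllable.encode("ascii"))
    body.append(0xF7)
    return bytes(body)


msg = build_lyric_sysex(["sa", "ku", "ra"])
```

The resulting bytes can then be interleaved with ordinary performance events in the MIDI stream.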
[40] The data separating section 14 has a built-in MIDI decoder; it analyzes the MIDI data read out from the storage means 13 and separates it into performance data and voice symbol data. The separated performance data is supplied to a musical sound synthesizer 16, composed of a sequencer and a MIDI sound source, through a buffer memory (Buff) 15 that operates as a delay circuit. The separated voice symbol data is supplied to the telephone function section 12 together with the performance data. In the telephone function section 12, a guide voice synthesized on the basis of the voice symbol data is output from the voice compression synthesis section 22. The guide voice guides the lyrics of the karaoke song by voice, in place of the guide lyrics image displayed on a display; it is synthesized in accordance with the progress of the karaoke music reproduced by the musical sound synthesizer 16 and output from the output section 20. The guide voice is therefore output before the timing at which the corresponding lyrics are to be sung, with lyrics of a predetermined phrase length synthesized and output as the guide voice. This guide voice is synthesized at a fast tempo, with melody, accent, and intonation added according to the performance data.
[41] To control the timing of outputting the guide voice and to add melody, accent, or intonation to the guide voice in this way, the processing unit in the telephone function section 12 analyzes the performance data of the vocal line (the part of the performance data for the vocal part). For example, by analyzing the change of key (melody) in the performance data of the vocal line, the change in pitch of the guide voice is controlled; by analyzing the velocity information and pronunciation length (gate time) information of the vocal line, which reflect musical articulations such as slur and staccato, the intonation and accent of the guide voice are controlled. When the karaoke data is for a duet song, whether a phrase is a female part or a male part may be determined from the change of key in the performance data of the vocal line, and the pitch may be set so that the guide voice of that phrase becomes a female or male voice.
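The kind of vocal-line analysis described above can be sketched as follows. The velocity threshold and gate-time ratios below are illustrative assumptions, not values given in the patent.

```python
def analyze_vocal_line(notes):
    """Derive guide-voice controls from vocal-part note events.

    `notes` is a list of (midi_pitch, velocity, gate_time, duration)
    tuples. The thresholds are illustrative: a high velocity is read
    as an accent, and the gate-time/duration ratio distinguishes
    staccato from slurred (legato) articulation.
    """
    controls = []
    for pitch, velocity, gate, duration in notes:
        ratio = gate / duration if duration else 0.0
        controls.append({
            "pitch": pitch,                     # melody -> guide-voice pitch
            "accent": velocity >= 96,           # strong note -> accented syllable
            "articulation": ("staccato" if ratio < 0.5
                             else "slur" if ratio > 0.95
                             else "normal"),
        })
    return controls
```

Each resulting control entry would drive the pitch, accent, and intonation of one guide-voice syllable.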
[42] The voice symbol data supplied to the telephone function section 12 is also supplied to the database 24, and voice parameters for each syllable are read from the database 24 so that the voice indicated by the voice symbol data can be synthesized by the voice compression synthesis section 22. These voice parameters are supplied to the voice compression synthesis section 22. Since the reading of the voice parameters from the database 24 is controlled by the analysis result of the performance data of the vocal line described above, the parameters reflect the melody, velocity, and pronunciation length of the vocal line. As a result, the pitch, accent, and intonation of the guide voice synthesized by the voice compression synthesis section 22 can be changed and controlled in accordance with the vocal line.
[43] As described above, for the guide voice, the performance data of the corresponding portion is read and analyzed in advance, and the guide voice is output before the musical sound based on that portion of the performance data is reproduced. In other words, reproduction of the musical sound based on the performance data is performed later than the guide voice. This delay is realized by the buffer memory 15: the performance data, delayed by the buffer memory 15 for a predetermined time, is supplied to the musical sound synthesizer 16 and the musical sound is reproduced. As a result, the guide voice synthesized by the voice compression synthesis section 22 is output from the output section 20 ahead of the musical sound reproduced by the musical sound synthesizer 16.
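The effect of the buffer delay can be sketched as a simple scheduler: each musical event is pushed back by a fixed buffer time while its guide-voice utterance keeps the original time, so the guide voice always leads. The one-second lead is an illustrative value, not one from the patent.

```python
import heapq


def schedule(events, guide_lead=1.0):
    """Merge musical-note events with their guide-voice utterances.

    Each event is a (time, phrase) pair. The note is delayed by
    `guide_lead` seconds (the buffer memory) while the guide utterance
    keeps the original time, so every phrase is spoken before it is
    played. Returns the merged events in time order.
    """
    merged = []
    for t, phrase in events:
        heapq.heappush(merged, (t, "guide", phrase))                # spoken first
        heapq.heappush(merged, (t + guide_lead, "note", phrase))    # delayed tone
    return [heapq.heappop(merged) for _ in range(len(merged))]
```

For two phrases at t=0 and t=2 with a one-second lead, the output alternates guide, note, guide, note.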
[44] The musical sound synthesizer 16 is composed of a sequencer and a MIDI sound source, and the musical sound it reproduces is sent to the effect section 17, where an effect is added. The musical sound with the effect added is mixed in the synthesis section 18 with the synthesized guide voice, to which an effect has been given by the effect section 23 before mixing. The mixed musical sound and guide voice are amplified by the amplifier 19 and output from the output section 20. In the effect sections 17 and 23, for example, localization control according to the number of speakers of the output section 20 is performed, and effects such as reverberation and chorus may also be added. In addition, since the database 24 stores voice parameters for synthesizing a representative machine-pronounced voice, the synthesized guide voice may be corrected by an equalizer. Further, the volume of the guide voice may be made variable, so that the volume of the guide voice can be reduced according to the skill of the singer.
[45] Next, FIG. 2 shows a detailed configuration of the voice compression synthesis section 22 and the database 24 in the telephone function section 12 of the mobile phone 1 of FIG.
[46] The voice compression synthesis section 22 shown in FIG. 2 includes a CELP decoder that decodes voice data obtained by high-efficiency compression encoding of voice information. Although not shown, the voice compression synthesis section 22 is also provided with a CELP encoder capable of compression-encoding voice information with high efficiency.
[47] The principle of this speech synthesis is that the characteristics of speech can be expressed by the pitch L of the source sound generated by the vocal cords, or its noise component (called the source characteristic parameters), together with the vocal tract propagation characteristics of the throat and mouth through which the sound passes and the radiation characteristics at the lips (called the vocal tract characteristic parameters). In other words, the speech synthesis model can be represented by a vocal cord model that generates the source sound and a vocal tract model that follows it.
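This source-filter idea can be sketched in a few lines: an impulse train standing in for vocal-cord pulses at pitch period L is passed through an all-pole filter standing in for the vocal tract. This is a toy model of the principle only, not the CELP coder itself; the coefficients are illustrative.

```python
def synthesize(pitch_period, lpc_coeffs, n_samples):
    """Toy source-filter synthesis.

    Source model: an impulse train with period `pitch_period` samples
    (the vocal cords). Filter model: an all-pole recursion with
    coefficients `lpc_coeffs` (the vocal tract). Returns the raw
    synthesized sample list.
    """
    output = [0.0] * n_samples
    history = [0.0] * len(lpc_coeffs)       # past outputs for the recursion
    for n in range(n_samples):
        excitation = 1.0 if n % pitch_period == 0 else 0.0
        y = excitation + sum(a * h for a, h in zip(lpc_coeffs, history))
        history = [y] + history[:-1]
        output[n] = y
    return output
```

With a period of 4 samples and one filter coefficient of 0.5, each pulse decays geometrically until the next pulse arrives.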
[48] The CELP decoder in the voice compression synthesis section 22 shown in FIG. 2 decodes the compression-encoded voice data back into the original voice by speech synthesis based on this speech synthesis model.
[49] In FIG. 2, the compressed voice data for each frame input to the voice compression synthesis section 22 is separated by the data processing section 30 into the voice parameters of index I, pitch L, and reflection coefficient γ. The pitch L parameter is distributed to the short-term oscillation section 32, the index I parameter to the codebook 31, and the reflection coefficient γ parameter to the neck approximation filter 34. The codebook 31 has the same content as the source-sound codebook in the encoder, recorded in a ROM (Read Only Memory).
[50] Based on the pitch L parameter, the short-term oscillation section 32 generates a decoded signal of the voice at pitch L and supplies it to the waveform reproducing section 33. The code vector data indicated by index I, read from the codebook 31, is also supplied to the waveform reproducing section 33, where it is combined with the decoded signal of the voice at pitch L and the combined source waveform is reproduced. The combined source waveform output from the waveform reproducing section 33 is similar to the waveform generated by the vibration of the human vocal cords; it is filtered by the neck approximation filter 34, whose filter coefficients are controlled by the reflection coefficient γ parameter, resulting in synthesized speech. The neck approximation filter 34 reproduces the transfer function of the human throat and mouth; it accumulates the reflection coefficients supplied in advance from the data processing section 30 and supplies them to each filter stage when necessary. The synthesized voice output from the neck approximation filter 34 is supplied to the spectral filter 35, where unnaturalness is removed, and is then output. In this way, the highly compressed encoded voice data of a call signal is decoded and output by the voice compression synthesis section 22.
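The decode chain of FIG. 2 can be caricatured as follows. The way the codebook vector and the pitch component are mixed, and the one-tap recursion standing in for the neck approximation filter, are deliberate simplifications: a real CELP decoder uses gains, an adaptive codebook, and a higher-order synthesis filter.

```python
def celp_decode_frame(index, pitch_L, gamma, codebook):
    """Sketch of the FIG. 2 decode chain for one frame.

    The code vector selected by `index` is mixed with a periodic
    component of period `pitch_L` (the combined source waveform), then
    shaped by a one-tap recursion driven by reflection coefficient
    `gamma` (a crude stand-in for the neck approximation filter).
    """
    vector = codebook[index]
    excitation = [v + (1.0 if n % pitch_L == 0 else 0.0)
                  for n, v in enumerate(vector)]
    out, prev = [], 0.0
    for x in excitation:
        y = x + gamma * prev        # crude vocal-tract approximation
        out.append(y)
        prev = y
    return out
```

A spectral post-filter, like block 35, would normally smooth the result before output.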
[51] The voice symbol data separated by the data separating section 14 is supplied to the speech database 40 of the database 24, and pitch parameters, waveform selection parameters, and reflection coefficient parameters for synthesizing the guide voice indicated by the supplied voice symbol data are output from the speech database 40. The output pitch parameter of pitch Lg is supplied to the short-term oscillation section 32, which generates a decoded signal of the voice at pitch Lg and supplies it to the waveform reproducing section 33. The waveform selection parameter is supplied to the waveform database 41, so that waveform data determining the voice type is read from the waveform database 41 and output to the waveform reproducing section 33. In the waveform reproducing section 33, the decoded signal at pitch Lg and the voice-type waveform data are combined, and the combined source waveform is reproduced. This combined waveform is filtered by the neck approximation filter 34, whose filter coefficients are controlled by the reflection coefficient γg parameter read from the reflection coefficient change database 42, to which the reflection coefficient parameter is supplied from the speech database 40; the guide voice is thereby synthesized. The synthesized voice output from the neck approximation filter 34 is supplied to the spectral filter 35, where unnaturalness is removed, and is output as the guide voice.
[52] A control signal is supplied to the database 24. The control signal controls the pitch Lg of the guide voice and how it varies, as well as the intonation and accent of the guide voice. The control signal carries the analysis result obtained by the processing unit built into the telephone function section 12 analyzing the performance data of the vocal line within the performance data. If the oscillation frequency of the short-term oscillation section 32 is changed by controlling the pitch parameter Lg with the control signal, the synthesized guide voice can be made either a female or a male voice. By changing the waveform data read from the waveform database 41, the voice type of the guide voice can be changed, and by changing the reflection coefficient γg parameter read from the reflection coefficient change database 42, the intonation and accent of the guide voice can be changed.
[53] Since the control signal is created by analyzing the performance data of the vocal line as described above, the pitch, intonation, and accent of the guide voice can be changed according to the melody of the vocal line. The user can thus understand in what key and how to sing by listening to the guide voice before singing.
[54] The database 24 is also supplied with time information (Time), which indicates the utterance timing and tempo of the guide voice; according to this time information, a predetermined waveform is read from the waveform database 41 and a predetermined reflection coefficient γg parameter is read from the reflection coefficient change database 42. The time information (Time), obtained by analyzing the performance data of the vocal line, is output at a timing before the timing at which the lyrics to be guided are sung. The time information also controls the length of each syllable of the guide voice and the speed at which the guide voice is output.
[55] As described above, the performance data of the vocal line is analyzed by the processing unit in the telephone function section 12. Although this analysis is performed by a processor executing an analysis program, FIG. 3 shows the analysis process represented as hardware.
[56] In FIG. 3, MIDI data, which is the delivered karaoke data, is supplied from the transceiver section 11 to the storage means 13. The MIDI data read out from the storage means 13 is interpreted by the data separating section 14, which has a MIDI decoder function, and is separated into performance data and voice symbol data. In this MIDI data, as shown in FIG. 4, the voice symbol data, a voice symbol string for the guide voice, is inserted as an exclusive message. An exclusive message appears as a message portion interposed between the status bytes "F0" and "F7" in the MIDI data, as shown in FIG. 4. This exclusive message is composed of a voice symbol string of the guide voice for each phrase, and may include timing information for uttering the guide voice of that phrase.
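The separation performed by the data separating section 14 can be sketched as scanning for the F0..F7 spans; a real MIDI parser must also handle running status and delta times, which are omitted here for brevity.

```python
def extract_voice_symbols(midi_bytes):
    """Split a raw MIDI byte stream into exclusive-message payloads
    and the remaining performance bytes.

    Payloads are the bytes between each 0xF0 status byte and the next
    0xF7, exclusive of both. Everything else is returned as a single
    performance-byte string.
    """
    payloads, performance = [], bytearray()
    i = 0
    while i < len(midi_bytes):
        if midi_bytes[i] == 0xF0:
            end = midi_bytes.index(0xF7, i)     # matching end-of-exclusive
            payloads.append(bytes(midi_bytes[i + 1:end]))
            i = end + 1
        else:
            performance.append(midi_bytes[i])
            i += 1
    return payloads, bytes(performance)
```

The payloads would go to the guide-voice path while the performance bytes go to the buffer memory and sound source.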
[57] The voice symbol data separated by the data separating section 14 is supplied to the guide voice height/speed determination section 36, where the pitch and speed (tempo) of the guide voice are determined. Here, by having the user specify a pitch, the guide voice can be made a female or a male voice. In addition, pitch information and tempo information, as well as intonation information and accent information, are supplied from the vocal line analyzer 38 to the guide voice height/speed determination section 36 through the switch SW. From the guide voice height/speed determination section 36, a control signal corresponding to the supplied information is output together with the voice symbol data so as to produce a guide voice at the specified pitch that follows the vocal line. The control signal controls the pitch, speed (tempo), intonation, and accent of the guide voice.
[58] When the switch SW is off and no pitch is specified, the guide voice takes a default pitch. If tuning the melody of the guide voice along the vocal line sounds unnatural, the switch SW may be turned off; in this case, the guide voice becomes monotonous.
[59] The performance data separated by the data separating section 14 is supplied to the vocal line analyzer 38 in the telephone function section 12, where the performance data of the vocal line is analyzed. The separated performance data is also delayed by the buffer memory 15 and supplied to the musical sound synthesizer 16. As a result, the musical sound based on the performance data, which is first read and analyzed by the vocal line analyzer 38, is reproduced with a delay relative to the guide voice. The vocal line analyzer 38 analyzes the change of key (melody) of the vocal line and the velocity information or envelope information reflecting musical articulations such as slur and staccato, as well as duration information and gate time information. The melody information obtained from this analysis is supplied from the vocal line analyzer 38 to the guide voice height/speed determination section 36 as pitch control information. The accent control information and intonation control information obtained by analyzing the velocity information and envelope information, and the guide voice utterance timing information and tempo information obtained by analyzing the duration information and gate time information, are likewise supplied to the guide voice height/speed determination section 36.
[60] The vocal line analyzer 38 also determines, in the case of a duet song, whether the vocal line being analyzed is a female part or a male part, and supplies pitch information corresponding to the result to the pitch/speed determination unit 36.
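The patent does not say how the analyzer tells a female part from a male part. One plausible heuristic, sketched below purely as an assumption, compares the line's average pitch with a fixed threshold (middle C, MIDI note 60), since female parts generally lie higher; the function name and threshold are hypothetical.

```python
# Hypothetical duet-part classification for the vocal line analyzer 38.
def classify_duet_part(pitches, threshold=60):
    """pitches: MIDI note numbers of one vocal line. Returns 'female' or 'male'."""
    avg = sum(pitches) / len(pitches)
    return "female" if avg >= threshold else "male"
```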
[61] As a result, the pitch of the guide voice and the length of each syllable are controlled according to the melody of the vocal line. In the case of a duet, a guide voice in a female voice is output before the female part, and a guide voice in a male voice is output before the male part. The guide voice is output at a timing corresponding to the utterance timing information supplied from the vocal line analyzer 38, and its speed follows the tempo information. The utterance timing information and tempo information of the guide voice are output from the pitch/speed determination unit 36 as time information (Time). When the voice symbol data itself includes timing information for uttering the guide voice, the guide voice is uttered based on that timing information.
[62] The control signal output from the pitch/speed determination unit 36 is supplied to the database 24 via the interpolator 37. The interpolator 37 prevents the pitch from changing unnaturally when the pitch of the guide voice is varied to follow the melody of the vocal line. In addition, the rate of pitch change of the guide voice is varied dynamically according to the speed of the vocal-line melody, so that the guide voice is output as a smooth voice. The buffer memory 15 is provided to synchronize the timing at which the vocal line is reproduced with the guide voice, and the utterance timing information of the guide voice described above takes the delay time of the buffer memory 15 into account.
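The interpolator's behavior can be illustrated with a minimal sketch: instead of jumping instantly to a new target pitch, the guide-voice pitch glides toward it, and the glide rate scales with the melody's tempo (a faster passage gets a faster glide). The specific update rule, parameter names, and constants below are assumptions; the patent only states the goal.

```python
# Hypothetical sketch of interpolator 37: first-order glide toward the target
# pitch, with the glide rate scaled by tempo. base_rate is an assumed constant.
def interpolate_pitch(current_hz, target_hz, dt, tempo_bpm, base_rate=8.0):
    """Move current_hz toward target_hz over time step dt (seconds)."""
    rate = base_rate * (tempo_bpm / 120.0)   # faster melody -> faster pitch change
    alpha = min(1.0, rate * dt)              # fraction of the gap closed this step
    return current_hz + alpha * (target_hz - current_hz)
```

Repeated calls converge smoothly on the target instead of stepping to it, which is the "smooth voice" effect described above.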
[63] The mobile telephone 1, to which the terminal apparatus according to the first embodiment of the present invention is applied, can download karaoke data from an external source.
[64] Fig. 5 is a conceptual diagram of downloading karaoke data to the mobile phone 1a and the mobile phone 1b, each having the same configuration as the mobile phone 1.
[65] In general, a cellular phone system employs a small-zone scheme in which a plurality of radio zones are arranged over the service area. Each radio zone is managed by a base station installed in that zone. When a mobile phone, which is a mobile station, calls an ordinary telephone, it is connected through the base station managing its radio zone to a mobile switching center, and the mobile switching center is connected to the public telephone network. Because the mobile phone is connected by a wireless line to the base station managing its radio zone, it can make calls with other telephones. When calling a mobile phone belonging to another radio zone, the connection is made from one mobile switching center to another through the base station managing the radio zone to which the called phone belongs.
[66] An example of such a cellular system is shown in Fig. 5, in which the mobile phone 1a belongs to the radio zone managed by base station 2c among the base stations 2a to 2d, and the mobile phone 1b belongs to base station 2a. The mobile phone 1a and the base station 2c are connected by a wireless line, and an up signal issued when making a call or registering a location is received and processed by the base station 2c. The same applies to the mobile phone 1b, except that its managing base station is 2a. Although the base stations 2a to 2d each manage different radio zones, the edges of the radio zones may overlap. The base stations 2a to 2d are connected to the mobile switching center 3 through a multiplexed line; the lines of a plurality of mobile switching centers 3 are collected at the gate switching center 4, which is connected to the general telephone switching center 5a. A plurality of gate switching centers 4 are connected to one another by relay transmission paths. The general telephone switching centers 5a, 5b, 5c ... are provided one per region and are connected to one another by relay transmission paths. A large number of general telephones are connected to each of the general telephone switching centers 5a, 5b, 5c ..., and, in this example, a delivery center 6 is connected to the general telephone switching center 5b.
[67] The delivery center 6 stores a large number of karaoke data, with new songs added from time to time. In this embodiment of the invention, the mobile phones 1a and 1b can download karaoke data from the delivery center 6, which is connected to the general telephone network. When the mobile phone 1a downloads karaoke data, it dials the telephone number of the delivery center 6. The connection between the delivery center 6 and the mobile phone 1a is thereby established along the route: mobile phone 1a - base station 2c - mobile switching center 3 - gate switching center 4 - general telephone switching center 5a - general telephone switching center 5b - delivery center 6. When the user then operates the ten-key pad, jog dial, or the like following the guidance displayed on the display unit, the mobile phone 1a can request and download the karaoke data for the desired song title. The karaoke data in this case contains the voice symbol data of the guide voice. Similarly, the mobile phone 1b can request and download karaoke data for a desired song title. Alternatively, the delivery center 6 may be connected to the Internet, and the karaoke data may be downloaded from the delivery center 6 via the Internet.
[68] When karaoke is performed on the mobile phone 1 shown in Fig. 1, the singing voice input from the microphone 21 is also output from the output unit 20. The mobile phone 1 supports hands-free calling, and when karaoke is performed hands-free, the sound from the output unit 20 may be picked up by the microphone 21 and cause howling (acoustic feedback). Therefore, when the mobile phone 1 supports hands-free calling, an echo canceller circuit is provided to prevent howling. The output from the output unit 20 may also be transmitted as a weak radio wave by an FM modulator and received and output by an FM receiver installed indoors or in a vehicle; since howling can occur in this case as well, an echo canceller circuit should likewise be provided.
[69] While karaoke is being performed, the transmitter of the mobile phone 1 is not used except when making requests, so battery life can be improved by turning off the power supplied to the transmitter except during requests.
[70] Next, Fig. 6 shows a configuration example of a second embodiment in which the terminal apparatus of the present invention is applied to a karaoke apparatus, together with the delivery center.
[71] This embodiment differs from the first embodiment essentially only in the communication function and the display function. That is, the transmission/reception function unit 11 and the telephone function unit 12 of the first embodiment are replaced by the modem 111 and the control unit 112, and a display unit 126 is added. The other components are functionally the same as in the first embodiment, so they are given the same reference numerals and their detailed description is omitted.
[72] In Fig. 6, reference numeral 100 denotes a karaoke apparatus to which the terminal apparatus according to the second embodiment of the present invention is applied; the karaoke apparatus 100 can download karaoke data from the delivery center 6. The karaoke apparatus 100 and the delivery center 6 are connected by a communication line, which here is a telephone line. The karaoke apparatus 100 is equipped with a modem 111 and can download the desired karaoke data from the delivery center 6 via the modem 111. The modem 111 demodulates received signals and modulates signals to be transmitted onto the communication line. The control unit 112, which includes the display control unit 125 and the speech synthesis unit 122, controls each part of the karaoke apparatus 100. When speech is synthesized by the control unit 112, the voice parameters read from the database 24 are supplied to the speech synthesis unit 122, which synthesizes speech according to those parameters. The database 24 stores voice parameters for the syllables from "a" to "n", including voiced sounds.
[73] The storage means 13 is a memory in which the delivered karaoke data is stored, as in the first embodiment. In this embodiment, the karaoke data is composed of performance data consisting of a string of performance events of the requested song, voice symbol data composed of a voice symbol for each syllable in the lyrics accompanying the performance data, and guide lyrics display data for displaying the guide lyrics on the display unit 126. The guide lyrics display data is supplied from the modem 111 to the control unit 112 and, when the performance data is played, is supplied sequentially from the control unit 112 to the display unit 126 so that the guide lyrics are displayed there. At that time, background image data suited to the genre of the performance data is read out from mass storage means (not shown) and displayed on the display unit 126 together with the guide lyrics.
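The sequential supply of guide lyrics display data in step with the performance can be sketched as a small feeder that releases each line exactly once when the performance reaches its position. The tick-based scheduling and all names here are illustrative assumptions; the patent only states that the data is supplied sequentially as the performance data is played.

```python
# Hypothetical lyrics feeder for the control unit 112 -> display unit 126 path.
def make_lyrics_feeder(lyric_events):
    """lyric_events: list of (tick, line) sorted by tick. Returns a function
    that, given the current performance tick, yields newly due lines once."""
    state = {"i": 0}
    def feed(now_tick):
        due = []
        while state["i"] < len(lyric_events) and lyric_events[state["i"]][0] <= now_tick:
            due.append(lyric_events[state["i"]][1])
            state["i"] += 1
        return due
    return feed
```

Calling `feed` from the playback loop with the current tick hands each lyric line to the display exactly when it becomes due, and never twice.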
[74] The karaoke data excluding the guide lyrics display data is data in MIDI format as shown in Fig. 4, into which the voice symbol data of the lyrics is inserted as an exclusive (system exclusive) message. For this reason, the amount of data in one piece of karaoke data, excluding the guide lyrics display data, can be kept small, so that one piece of karaoke data can be transmitted in a short time.
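Carrying the voice symbols inside the MIDI stream as an exclusive message can be illustrated with standard SysEx framing (0xF0 ... 0xF7). The manufacturer ID 0x7D (reserved for non-commercial use) and the length-prefixed payload layout below are assumptions for illustration; the patent does not specify the exact byte format of Fig. 4.

```python
# Hypothetical packing of syllable voice symbols into one SysEx message.
def build_voice_symbol_sysex(syllables):
    """Pack syllable voice symbols into a SysEx message (all data bytes < 0x80)."""
    payload = bytearray()
    for s in syllables:
        data = s.encode("ascii")
        payload.append(len(data))     # length prefix per syllable
        payload.extend(data)
    return bytes([0xF0, 0x7D]) + bytes(payload) + bytes([0xF7])

def parse_voice_symbol_sysex(msg):
    """Recover the syllable list from a message built above."""
    assert msg[0] == 0xF0 and msg[-1] == 0xF7
    body, out, i = msg[2:-1], [], 0
    while i < len(body):
        n = body[i]
        out.append(body[i + 1:i + 1 + n].decode("ascii"))
        i += 1 + n
    return out
```

Because a sequencer that does not understand the exclusive message simply skips it, the lyric symbols ride along with the performance events without disturbing playback, which is why one small MIDI file can carry both.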
[75] In this embodiment, the voice symbol data separated by the data separation unit 14 is supplied to the control unit 112 together with the performance data. The control unit 112 causes the speech synthesis unit 122 to output a guide voice synthesized on the basis of the voice symbol data. The guide voice guides the singer through the guide lyrics displayed on the display unit 126 when a song is sung in karaoke; it is synthesized in step with the progress of the karaoke music reproduced by the music synthesizer 16 and output from the output unit 20.
[76] When karaoke is performed in the karaoke apparatus 100 shown in Fig. 6, the singing voice input from the microphone 21 is also output from the output unit 20.
[77] Next, Fig. 7 shows a detailed configuration of the speech synthesis unit 122 in the control unit 112 and the database 24 of the karaoke apparatus 100 according to the second embodiment of the present invention.
[78] Unlike the speech compression synthesis unit 22 in the telephone function unit 12 of the mobile phone 1 to which the terminal device of the first embodiment is applied, the speech synthesis unit 122 shown in Fig. 7 does not include an encoder. The other components are the same as those of the speech compression synthesis unit 22, so their description is omitted.
[79] As in the first embodiment, the performance data of the vocal line is analyzed by the processing unit in the control unit 112 by executing the analysis program. The flow of this analysis processing is shown in FIG. and, being the same as that of the mobile phone 1 of the first embodiment described above, its description is omitted.
[80] Here, the case where the karaoke apparatus 100 downloads karaoke data will be described. The karaoke apparatus 100 accesses the delivery center 6 via the modem 111, whereby the karaoke apparatus 100 and the delivery center 6 are connected. Next, by operating input means (not shown) in accordance with the guidance displayed on the display unit 126, the user can request and download the karaoke data for the desired song title. In this case, the karaoke data includes the voice symbol data of the guide voice and is accompanied by the guide lyrics display data. Alternatively, the karaoke data may be downloaded from the delivery center 6 over the Internet by connecting the delivery center 6 to the Internet.
[81] Needless to say, the object of the present invention is also achieved by supplying a storage medium storing the program code of software that realizes the functions of the above-described embodiments to an electronic apparatus such as a karaoke apparatus, a mobile telephone, or a personal computer, installing the program from that medium, and having the computer (or CPU) of the electronic apparatus execute it.
[82] In this case, the program code itself installed in the electronic apparatus from the storage medium realizes the novel functions of the present invention, and the storage medium storing the program code constitutes the present invention.
[83] As the storage medium for recording the program code, for example, a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, ROM, or the like can be used. The program code may also be supplied from a server computer via a communication network.
[84] The functions of the above-described embodiments are realized not only when the computer executes the read-out program code, but also when an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the above-described embodiments are thereby realized; such cases are, needless to say, also included. Furthermore, the program code read from the storage medium may be written into a memory provided on a function expansion board inserted into the karaoke device, personal computer, or the like, or in a function expansion unit connected to it, after which a CPU or the like provided on the expansion board or in the expansion unit performs part or all of the actual processing based on the instructions of the program code; the case where the functions of the above-described embodiments are realized by that processing is of course also included.
[85] As described above, the terminal device of the present invention can be applied to a karaoke device having a communication function, and to a mobile telephone such as a cellular phone or car phone having a karaoke function. Moreover, an electronic device having a karaoke function can be given a communication function by connecting a modem, a mobile phone, or the like, so that the terminal device of the present invention can be applied to it as well.
Claims (13)
[1" claim-type="Currently amended] A terminal apparatus to which performance data consisting of a string of performance events and voice symbol data consisting of a voice symbol for each syllable in the lyrics accompanying the performance data are delivered, the terminal apparatus comprising:
a music sound synthesis unit for reproducing a musical sound from the performance data;
a speech synthesis unit for synthesizing a guide voice based on the voice symbol data; and
a speech synthesis control unit for reading the performance data first and controlling the speech synthesis unit, thereby changing a property of the guide voice synthesized by the speech synthesis unit in accordance with the performance data.
[2" claim-type="Currently amended] The terminal device according to claim 1, wherein the performance data is performance data in MIDI format, and the voice symbol data is inserted into the performance data as an exclusive message.
[3" claim-type="Currently amended] The terminal device according to claim 1, further comprising an analysis unit for analyzing performance data of a vocal line among the performance data, wherein the speech synthesis control unit controls the speech synthesis unit in accordance with an analysis result of the analysis unit so that the pitch and intonation of the synthesized guide voice change in accordance with the vocal line.
[4" claim-type="Currently amended] The terminal device according to claim 3, wherein the speech synthesis control unit controls the synthesis timing of the speech synthesis unit in accordance with the analysis result of the analysis unit so that the guide voice synthesized by the speech synthesis unit is uttered before the corresponding vocal line.
[5" claim-type="Currently amended] The terminal device according to claim 4, further comprising a voice database in which voice parameters are stored, wherein the speech synthesis control unit gives the voice parameters, read from the voice database in accordance with the voice symbol data and the analysis result of the analysis unit, to each syllable of the guide voice synthesized by the speech synthesis unit in accordance with the voice symbol data, and also causes the pitch and intonation of the guide voice to change in accordance with the vocal line.
[6" claim-type="Currently amended] A terminal apparatus to which performance data consisting of a string of performance events and voice symbol data consisting of a voice symbol for each syllable in the lyrics accompanying the performance data are delivered, the terminal apparatus comprising:
a telephone function unit enabling a call;
a music sound synthesis unit for reproducing a musical sound from the performance data; and
a speech synthesis unit for synthesizing a guide voice based on the voice symbol data and for decoding call voice data.
[7" claim-type="Currently amended] The terminal apparatus according to claim 6, further comprising a speech synthesis control unit for reading the performance data first and controlling the speech synthesis unit, thereby changing a property of the guide voice synthesized by the speech synthesis unit in accordance with the performance data.
[8" claim-type="Currently amended] The terminal device according to claim 6, wherein the performance data is performance data in MIDI format, and the voice symbol data is included in the performance data as an exclusive message.
[9" claim-type="Currently amended] The terminal device according to claim 6, further comprising an analysis unit for analyzing performance data of a vocal line among the performance data, and a speech synthesis control unit for controlling the speech synthesis unit in accordance with an analysis result of the analysis unit so that the pitch and intonation of the guide voice synthesized by the speech synthesis unit change in accordance with the vocal line.
[10" claim-type="Currently amended] The terminal device according to claim 9, wherein the speech synthesis control unit controls the synthesis timing of the speech synthesis unit in accordance with the analysis result of the analysis unit so that the guide voice synthesized by the speech synthesis unit is uttered before the corresponding vocal line.
[11" claim-type="Currently amended] The terminal device according to claim 9, wherein the speech synthesis control unit gives the voice parameters, read from a voice database in accordance with the voice symbol data and the analysis result of the analysis unit, to each syllable of the guide voice synthesized by the speech synthesis unit in accordance with the voice symbol data, and also causes the pitch and intonation of the guide voice to change in accordance with the vocal line.
[12" claim-type="Currently amended] A guide voice reproducing method for a terminal device to which content data is delivered, the content data comprising performance data consisting of a string of performance events and voice symbol data consisting of a voice symbol for each syllable in the lyrics accompanying the performance data, the method comprising:
reproducing a musical sound from the performance data;
synthesizing a guide voice based on the voice symbol data; and
reading the performance data first to change a property of the synthesized guide voice in accordance with the performance data.
[13" claim-type="Currently amended] A storage medium storing a program for causing a computer of a terminal device to execute a guide voice reproducing method, the terminal device being one to which content data is delivered, the content data comprising performance data consisting of a string of performance events and voice symbol data consisting of a voice symbol for each syllable in the lyrics accompanying the performance data, the program comprising:
a music reproducing module for reproducing a musical sound from the performance data;
a guide voice synthesis module for synthesizing a guide voice based on the voice symbol data; and
a guide voice varying module for reading the performance data first to change a property of the synthesized guide voice in accordance with the performance data.
Similar technologies:
Publication number | Publication date | Patent title
JP6122404B2|2017-04-26|System and method for portable speech synthesis
US5734119A|1998-03-31|Method for streaming transmission of compressed music
US7058428B2|2006-06-06|Portable phone equipped with composing function
CN1307614C|2007-03-28|Method and arrangement for synthesizing speech
US7099704B2|2006-08-29|Music player applicable to portable telephone terminal
US7514624B2|2009-04-07|Portable telephony apparatus with music tone generator
US5717823A|1998-02-10|Speech-rate modification for linear-prediction based analysis-by-synthesis speech coders
JP4758044B2|2011-08-24|Method for broadcasting an audio program using musical instrument digital interface | data
EP1255243B1|2010-07-14|Portable telephone and music reproducing method
TWI250508B|2006-03-01|Voice/music piece reproduction apparatus and method
US5943648A|1999-08-24|Speech signal distribution system providing supplemental parameter associated data
KR101274961B1|2013-06-13|music contents production system using client device.
ES2265442T3|2007-02-16|Apparatus for the expansion of the band width of a vocal signal.
US6525256B2|2003-02-25|Method of compressing a midi file
EP1671317B1|2018-12-12|A method and a device for source coding
US6442517B1|2002-08-27|Methods and system for encoding an audio sequence with synchronized data and outputting the same
EP0714089B1|2002-07-17|Code-excited linear predictive coder and decoder, and method thereof
US5704007A|1997-12-30|Utilization of multiple voice sources in a speech synthesizer
KR100536965B1|2005-12-14|Telephone terminal apparatus and communication method
RU2333546C2|2008-09-10|Voice modulation device and technique
US7025657B2|2006-04-11|Electronic toy and control method therefor
US5889223A|1999-03-30|Karaoke apparatus converting gender of singing voice to match octave of song
US5860065A|1999-01-12|Apparatus and method for automatically providing background music for a card message recording system
US20060165240A1|2006-07-27|Methods and apparatus for use in sound modification
US8229738B2|2012-07-24|Method for differentiated digital voice and music processing, noise filtering, creation of special effects and device for carrying out said method
Patent family:
Publication number | Publication date
CN100461262C|2009-02-11|
CN1436345A|2003-08-13|
JP2001356784A|2001-12-26|
WO2001097209A1|2001-12-20|
TW529018B|2003-04-21|
HK1054460A1|2003-11-28|
KR100530916B1|2005-11-23|
AU6424001A|2001-12-24|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title
Legal status:
2000-06-12|Priority to JPJP-P-2000-00175358
2000-06-12|Priority to JP2000175358A
2001-06-11|Application filed by 야마하 가부시키가이샤
2001-06-11|Priority to PCT/JP2001/004911
2003-02-05|Publication of KR20030010696A
2005-11-23|Application granted
2005-11-23|Publication of KR100530916B1
Priority:
Application number | Filing date | Patent title
JPJP-P-2000-00175358|2000-06-12|
JP2000175358A|JP2001356784A|2000-06-12|2000-06-12|Terminal device|
PCT/JP2001/004911|WO2001097209A1|2000-06-12|2001-06-11|Terminal device, guide voice reproducing method and storage medium|